CHAPTER 3: Labels & Addresses & Variables

  Okay, let's get to variables. In the previous chapter I wrote that "variable"
  is a general term for a space which stores some value. Registers are
  variables, for example. But there is only a limited number of registers (VERY
  limited, some 8 + a few special ones), and this is nearly always not enough.
  For this reason memory (RAM - random access memory) is used.

  NOTE: when someone says "variable", they almost always mean a memory variable.
  
CHAPTER 3.1: Labels  
  
  The problem is that you have to know WHERE in memory a value is stored. A
  position in memory (called an "address") is given by a number. But it is
  quite hard to remember this number (address) for every variable.

  >>> term: address (number which gives position in memory)

  Another problem with addresses is that when you change your program, an
  address can change too, and you would have to correct this number everywhere
  it is used. For this reason addresses are represented by "labels". A label is
  just a word (not a string, it is not enclosed in apostrophes) which, in your
  program, represents an address in memory. When you compile your program, the
  compiler replaces each label with the proper address. A label consists of
  letters ("a" to "z", "A" to "Z"), digits ("0" to "9"), underscores ("_") and
  dots ("."). But the first character of a label can't be a digit or a dot. A
  label also can't have the same name as a directive or an instruction
  (instruction mnemonic). Labels are case sensitive in FASM ('a' is NOT the
  same as 'A'). Examples of labels:
    "name"        is label
    "a"           is label
    "A"           is label, different from "a"
    "name2"       is label
    "name.NAME2"  is label
    "name._NAME2" is label
    "_name"       is label
    "_"           is label
    ".name"       is not label, because it starts with dot (labels starting with
                  dot have special meaning in FASM, which you will learn later)
    "1"           is not label, because it starts with number
    "1st_name"    is not label for same reason
    "name1 name2" is not label, because it contains space
    "mov"         is not label, because "mov" is instruction mnemonic

  >>> term: label
  
  You can define a label using the directive "label". This directive should be
  followed by the label itself (the label name). For example:
    label name  is label definition, it defines label "name"
    label _name is label definition, it defines label "_name"
    label label is not label definition, because "label" can't be name of label
                as described in previous paragraph
  This defines a label that represents the address of the data defined after
  it.

  >>> term: label definition
  >>> directive: label

  A shorter way to define a label is to write the label name followed by a
  colon (":"):
    name:
    _name:
  but we'll use this form later.

  >>> label definition using ":"

CHAPTER 3.2: Variable definition

  Now we can return to the problem with variables: how to define a variable in
  memory. The program you create (the compiled program, in machine code) is
  loaded into memory at execution time, where the processor executes it
  instruction by instruction. Look at this program:
    org 256
    mov al,10
    db 'this is a string'
    int 20h
  This program will probably crash, because after the processor executes
  "mov al,10" it reaches the string. But in a program there is no difference
  between a string and instructions in machine code. Both are translated into
  an array of numeric values (bytes). There is no way the processor can tell
  whether a numeric value is the translation of a string or the translation of
  an instruction. In this example, the processor will execute the instructions
  whose numeric representation (in machine code) is the same as the ASCII
  representation of the string "this is a string".
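
  To make this concrete, here is a small sketch of my own (it is not one of
  the original examples). It assumes the standard 8086 encodings B0h 0Ah for
  "mov al,10" and CDh 20h for "int 20h", and writes those instructions with
  "db", exactly as if they were data:

```asm
    org 256
    db 0B0h,0Ah            ; the two bytes of "mov al,10"
    db 0CDh,20h            ; the two bytes of "int 20h"
    db 74h,68h,69h,73h     ; the four bytes of the string 'this'
```

  The processor executes the first four bytes as instructions simply because
  it reaches them first. Nothing in the file marks them as code, or the last
  four bytes as text.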

  Now look at this:
    org 256
    mov al,10
    int 20h
    db 'this is a string'
  This program will not crash, because before reaching the bytes defined by the
  string the processor reaches the instruction "int 20h", which ends execution
  of the program. So the bytes defined by the string will not be executed, they
  will just take up some space. This is how you can define a variable - define
  some data at a place where the processor won't try to execute it (after
  "int 20h" in this case).

  So here is code with a byte-sized variable with value 105:
    org 256
    mov al,10
    int 20h
    db 105
  The last line defines a byte variable with value 105.

  Now how do we access the variable? First we must know its address. For this
  we can use a label (described above, reread it if you forgot):
    org 256
    mov al,10
    int 20h
    label my_first_variable
    db 105
  So now we know the address of the variable. It is represented by the label
  "my_first_variable". Now how do we access it? You may think it is, for example
    mov al,my_first_variable
  but no! Remember, I told you that a label ("my_first_variable" in this case)
  stands for the address of the variable. So this instruction asks for the
  address of the variable to be moved to the "al" register, not the variable's
  contents. To access the contents of a variable (or of any memory location)
  you must enclose its address in brackets ("[" and "]"). So to access the
  contents of our variable and copy its value to "al" we use
    mov al,[my_first_variable]
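
  Here are both forms side by side, as a sketch. The word-sized register "bx"
  is my choice for illustration, because the address here does not fit into a
  byte-sized register like "al":

```asm
    org 256
    mov al,[my_first_variable]   ; al = 105, the CONTENTS of the variable
    mov bx,my_first_variable     ; bx = 264, the ADDRESS of the variable
    int 20h
    label my_first_variable
    db 105
```

  The address is 264 because the program starts at 256 and the three
  instructions before the variable take 3, 3 and 2 bytes.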

  Now we will define two variables:
    org 256
    <some instructions>
    int 20h
    label variable1
    db 100
    label variable2
    db 200
  So to copy the value of "variable1" to "al" we use
    mov al,[variable1]
  To copy "al" to "variable1" use
    mov [variable1],al
  To set the value of "variable1" (exactly: to set the value of the variable
  stored at the address represented by "variable1") to 10 we could try
    mov [variable1],10
  but this will cause an error (try it if you want). The problem is that you
  know you are changing the variable at address "variable1" to 10, but what is
  the size of the variable? In the previous two cases the byte size could be
  determined because you used the "al" register, which is byte sized, so the
  compiler decided that the variable at "variable1" is byte sized too. But in
  this case the value 10 can be of any size, so the compiler can't decide the
  size of the memory variable. To solve this we use "size operators". We will
  talk about two size operators for now: "byte" and "word". You can put a size
  operator before an instruction operand to let the compiler know what the
  variable size is:
    mov byte [variable1],10
  Another way to do this is
    mov [variable1], byte 10
  In this case the compiler knows that the moved value 10 is byte sized, so it
  decides that the variable is byte sized too.

  But it would be hard to always remember and always write the size of a
  variable when you access it. For this reason you can assign the size of the
  variable to the label when you define it. Just write the size operator after
  the label name in the definition:
    label variable1 byte
    db 100
  or
    label variable1 word
    dw 1000
  Now "mov [variable1],10" will work. In the first case it will store the value
  10 to the byte at address "variable1", in the second case to the word.
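
  The three ways of giving the size can be put into one small program. This is
  only a sketch, and the names "counter", "total" and "untyped" are made up
  for the illustration:

```asm
    org 256
    mov [counter],10         ; size comes from the label: stores a byte
    mov [total],10           ; size comes from the label: stores a word
    mov byte [untyped],10    ; label has no size, so the operand needs one
    int 20h
    label counter byte
    db 0
    label total word
    dw 0
    label untyped
    db 0
```

  Without the "byte" operator the third "mov" would be rejected, because
  neither the label nor the value 10 says how big the variable is.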

  >>> term: size operator

  NOTE: You can't move a value between operands of different sizes. Neither of
  these will compile:
    mov byte [variable1], word 10
  or
    mov [variable1],al
    ...
    label variable1 word
    dw 0

  NOTE: You can't access two memory locations in one instruction (except for
  some special instructions). This is wrong, it won't compile:
    mov [variable1],[variable2]
  Use this instead:
    mov al,[variable2]
    mov [variable1],al
  This will cause you some problems in the beginning, but it will force you
  to write faster code, and that is the biggest reason to code in assembly.

  NOTE: A size operator assigned to a label at definition has lower priority
  than a size operator written before the operand in an instruction, so:
    mov byte [variable],10
    label variable word
    dw 0
  will access a BYTE, while
    mov [variable],10
  will access a WORD.

  I think you noticed that needing two lines to define one variable is a little
  too much. There is a shorter way to define a variable:
    variable1 db 100
  is the same as
    label variable1 byte
    db 100
  Notice that the size of the variable is defined too. It works with words too:
    variable2 dw 100

  An example of using variables:
    org 256
    mov ah,2                       ; DOS function 2: write character
    mov dl,[character_to_write]    ; dl = character to write
    int 21h                        ; call DOS
    int 20h                        ; end program
    character_to_write db 'a'

CHAPTER 3.3: Addresses and basics of segmentation

  Now we will discuss addresses a little more. I have told you that an address
  is a number (!) which gives some position in memory. You have learnt how to
  represent this number with labels, so the numeric addresses were maintained
  by the compiler. But you still don't know anything about the format of this
  number. I will try to explain it in this chapter.

  As you probably know, data in memory is stored in "bits", each of which can
  have the value 0 or 1. You can consider memory as a (one dimensional) array
  of bits. 8 consecutive bits make one byte. An address is the number (index,
  position in the array) of a byte. For example, address "0" is the address of
  the first bit of memory (or the address of the first byte), address "1" is
  the address of the ninth bit (or the address of the second byte) of memory,
  etc.

  An address in .COM files is a word-sized number, so
    label var1
    <some data>
    mov al,var1
  is wrong. It may work if "var1" is less than 256, so that it fits into a
  byte-sized register, but in general store addresses in word-sized variables.
  We will talk about them a little later.

  Now some examples of addresses. Look at this file:
    label variable1
    db 10
    label variable2
    db 20
    label variable3
    db 30
  Here the address represented by "variable1" is 0, "variable2" stands for 1,
  and "variable3" is 2.

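  Each definition moves the next address forward by the size of its data, so
  with mixed sizes the addresses are no longer simply 0, 1, 2. A sketch with
  made-up names, compiled like the file above (without "org"):

```asm
    label first
    db 10          ; address 0, takes 1 byte
    label second
    dw 2000        ; address 1, takes 2 bytes
    label third
    db 30          ; address 3
```
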
  OK, this looks nice, but it is not true at all. The problem is that there
  are usually several programs loaded in memory at the same time (operating
  system, mouse driver, your program etc.). Done this way, a program would have
  to know WHERE in memory it will be loaded so that it can access its
  variables. For this reason addresses are "relative". This means that for
  every loaded program a region of memory called a "segment" is reserved. All
  addresses accessed by this program are then relative to the beginning of this
  region. So [0] doesn't mean the first byte of memory, but the first byte of
  the segment.
  >>> term: segment

  How does this work? The processor has a few special registers (segment
  registers) which hold the address of a segment (the address of the first byte
  of the segment). Every time you access memory in your program, the contents
  of a segment register are added to the address given by you, so "mov al,[0]"
  accesses the first byte of your segment.

  NOTE: I have told you that memory addresses in .COM programs are words. That
    means they can be in the range 0 to 65535. So the maximal size of one
    segment is 65536 bytes. This can be "tricked" by changing the contents of
    segment registers, but don't worry about this now.

  NOTE: A segment is a region in memory. But the term "segment" is often used
    for the address of the beginning of this region. Sad but true.

  So an absolute address in memory has two parts: the segment (exactly: the
  address of the beginning of the segment) and a second part, a word-sized
  value called the "offset", which is the address relative to the segment
  (to the address of the beginning of the segment).
  >>> term: offset

  IMPORTANT NOTE: I said labels represent the address of a variable. In fact,
    labels in FASM represent the offset of a variable. That is why it is called
    "flat" (you will comprehend this later, much much later) :)

  I won't go deeper into segment registers and how the address of the beginning
  of a segment is stored in them (there IS a difference). Take segment
  registers as some kind of black box for now: it works, and we can ignore the
  details.

CHAPTER 3.4: 'org' directive explained
  
  When your program is loaded, it often needs some external info from the
  program that ran it. The best example is command line arguments, or it may
  need to know WHO ran it, etc. This data must, of course, be stored in the
  same segment as the program. For .COM files this data (passed to your program
  by the program that ran it) is stored in the first 256 bytes of the segment.
  So your program is loaded from offset 256.

  NOTE: The 256-byte structure at the beginning of a .COM program's segment is
    called the "PSP", which stands for "program segment prefix".

  Now imagine this .COM program:
    mov al,[variable1]
    int 20h
    variable1 db 0
  (notice - no "org 256" directive). The instruction "mov al,[variable1]" takes
  3 bytes and "int 20h" takes 2 bytes, so "variable1" will stand for offset 5.
  So the instruction "mov al,[variable1]" is "mov al,[5]". This instruction
  accesses the 6th byte of the segment (the first byte is at offset 0). But I
  already told you that the first 256 bytes of the segment hold some
  information, and your program is loaded after them, from offset 256. So you
  don't want "variable1" to be 5, you want it to be 256+5. And this is what the
  "org" directive does. It sets the "origin" of file addresses. "org 256" tells
  FASM to add 256 to the offset held by every label defined after this
  directive (before the next "org" directive). And this is exactly what we want
  in .COM files.

  So the code above won't access the variable you want, it will access
  something in the PSP (first 256 bytes of the segment). To make it work
  properly use:
    org 256
    mov al,[variable1]
    int 20h
    variable1 db 0
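
  The same program again, annotated with the offset at which each line ends up
  in the segment (my own annotation, using the instruction sizes given above):

```asm
    org 256
    mov al,[variable1]    ; at offset 256, takes 3 bytes
    int 20h               ; at offset 259, takes 2 bytes
    variable1 db 0        ; at offset 261 = 256+5
```

  So with "org 256" the first instruction is compiled as "mov al,[261]", which
  is where the variable really is once the program is loaded after the PSP.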

  I won't tell you about the data contained in the PSP, you don't have to care
  about it now.
